\subsection{Gradient vector}
Consider a function \(f(x, y)\), and some small displacement \(\dd \vb s\).
We want to find the rate of change of \(f\) in this direction.
Recall from the multivariate chain rule that the change in \(f\) due to small changes in \(x\) and \(y\) is given by
\begin{align*}
	\dd{f} & = \frac{\partial f}{\partial x} \dd{x} + \frac{\partial f}{\partial y}\dd{y}                         \\
	       & = (\dd{x}, \dd{y}) \cdot \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right) \\
	       & = \dd \vb s \cdot \grad f
\end{align*}
where \(\dd \vb s = (\dd{x}, \dd{y})\); \(\grad f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)\).
We call \(\grad f\) the `gradient vector', in this case in Cartesian coordinates.
If we let \(\dd \vb s = \dd{s} \hat{\vb s}\) where \(\abs{\hat{\vb s}} = 1\), then we can write
\[
	\dd{f} = \dd{s} (\hat{\vb s}\cdot \grad f)
\]

We define the directional derivative by
\[
	\frac{\dd{f}}{\dd{s}} = \hat{\vb s} \cdot \grad f
\]
This is the rate of change of \(f\) in the direction given by \(\hat{\vb s}\).
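As an illustrative aside (not part of the derivation), this definition is easy to verify numerically: for an assumed sample function \(f(x, y) = x^2 + 3xy\), a central finite difference along a unit direction \(\hat{\vb s}\) should agree with \(\hat{\vb s} \cdot \grad f\).

```python
import math

# Illustrative sketch with an assumed sample function f(x, y) = x^2 + 3xy:
# compare a finite-difference df/ds with the dot product s_hat . grad f.
def f(x, y):
    return x**2 + 3*x*y

def grad_f(x, y):
    # Analytic gradient (df/dx, df/dy) = (2x + 3y, 3x)
    return (2*x + 3*y, 3*x)

def directional_derivative(x, y, sx, sy, h=1e-6):
    # Central difference of f along the unit direction (sx, sy)
    norm = math.hypot(sx, sy)
    sx, sy = sx / norm, sy / norm
    return (f(x + h*sx, y + h*sy) - f(x - h*sx, y - h*sy)) / (2*h)

gx, gy = grad_f(1.0, 2.0)
sx, sy = 0.6, 0.8                  # a unit vector s_hat
numeric = directional_derivative(1.0, 2.0, sx, sy)
analytic = gx*sx + gy*sy           # s_hat . grad f
print(abs(numeric - analytic) < 1e-6)  # prints True: the two agree
```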

\begin{enumerate}
	\item The magnitude of the gradient vector \(\grad f\) is the maximum rate of change of \(f(x, y)\) over all directions:
	      \[
		      \abs{\grad f} = \max\limits_{\theta} \left( \frac{\dd{f}}{\dd{s}} \right)
	      \]
	\item The direction of \(\grad f\) is the direction in which \(f\) increases most rapidly.
	      \[
		      \frac{\dd{f}}{\dd{s}} = \abs{\grad f}\cos\theta
	      \]
	      where \(\theta\) is the angle between \(\grad f\) and \(\hat{\vb s}\); this follows from the definition of the directional derivative, since \(\abs{\hat{\vb s}} = 1\).
	\item If \(\dd \vb s\) (and \(\hat{\vb s}\)) are parallel to contours of \(f\), then
	      \[
		      \frac{\dd{f}}{\dd{s}} = \hat{\vb s} \cdot \grad f = 0
	      \]
	      Hence the gradient vector is perpendicular to contours of \(f\), and \(\abs{\grad f}\) is the slope in the `uphill' direction.
\end{enumerate}
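The first two properties can be sanity-checked numerically: sweeping \(\hat{\vb s}\) over all angles \(\theta\), the largest directional derivative should be attained along \(\grad f\) and equal \(\abs{\grad f}\). A minimal sketch with assumed gradient components \((8, 3)\):

```python
import math

# Sketch: sweep directions theta and confirm max df/ds equals |grad f|,
# attained in the direction of grad f (assumed components gx, gy).
gx, gy = 8.0, 3.0                         # assumed gradient components
rates = [gx*math.cos(t) + gy*math.sin(t)  # df/ds = s_hat . grad f
         for t in (k * 2*math.pi/3600 for k in range(3600))]
best = max(rates)
grad_mag = math.hypot(gx, gy)             # |grad f| = sqrt(73)
print(abs(best - grad_mag) < 1e-3)        # max rate of change is |grad f|
best_theta = rates.index(best) * 2*math.pi/3600
print(abs(best_theta - math.atan2(gy, gx)) < 1e-2)  # attained along grad f
```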

\subsection{Stationary points}
In general, there is always at least one direction in which the directional derivative is zero, since we can just choose a direction perpendicular to the gradient vector, or equivalently parallel to contours of \(f\).
At stationary points, \(\frac{\dd{f}}{\dd{s}} = 0\) for all directions, so \(\grad f = \vb 0\).
There are several types of stationary point:
\begin{itemize}
	\item Minimum points, where the function increases in every direction away from the point;
	\item Maximum points, where the function decreases in every direction away from the point; and
	\item Saddle points, where the function is a minimum in one direction but a maximum in another.
\end{itemize}
Note:
\begin{itemize}
	\item Near minima and maxima, the contours of \(f\) are elliptical.
	\item Near a saddle, the contours of \(f\) are hyperbolic.
	\item Contours of \(f\) can only cross at saddle points.
\end{itemize}

\subsection{Taylor series for multivariate functions}
Let us expand a function \(f(x, y)\) around a point \(\vb s_0\), and evaluate it at some point \(\vb s_0 + \delta \vb s\), where \(\delta \vb s = \delta s \hat{\vb s}\).
The Taylor series expansion in the direction of \(\hat{\vb s}\) is
\[
	f(s_0 + \delta s) = f(s_0) + \delta s \eval{\frac{\dd{f}}{\dd{s}} }_{s_0} + \frac{1}{2} (\delta s)^2\eval{\frac{\dd^2 f}{\dd{s}^2}}_{s_0} + \dots
\]
Further, by the definition of the directional derivative,
\[
	\frac{\dd}{\dd{s}} = \hat{\vb s}\cdot \grad
\]
Hence
\[
	\delta s \frac{\dd}{\dd{s}} = \delta \vb s \cdot \grad
\]
Now we can rewrite this Taylor series as follows:
\[
	f(s_0 + \delta s) = f(s_0) + (\delta s)(\hat{\vb s} \cdot \grad) \eval{f}_{s_0} + \frac{1}{2}(\delta s)^2(\hat{\vb s} \cdot \grad)(\hat{\vb s} \cdot \grad)\eval{f}_{s_0} + \dots
\]
\[
	f(s_0 + \delta s) = f(s_0) + \underbrace{(\delta \vb s \cdot \grad) \eval{f}_{s_0}}_{(1)} + \underbrace{\frac{1}{2}(\delta \vb s \cdot \grad)(\delta \vb s \cdot \grad)\eval{f}_{s_0}}_{(2)} + \dots
\]
Expressing this in Cartesian coordinates:
\[
	\vb s_0 = (x_0, y_0);\quad \delta \vb s = (\delta x, \delta y);\quad x=x_0+\delta x;\quad y=y_0+\delta y
\]
Therefore,
\[
	(1) = \delta x \frac{\partial f}{\partial x}(x_0, y_0) + \delta y \frac{\partial f}{\partial y}(x_0, y_0)
\]
\begin{align*}
	(2) & = \eval{\frac{1}{2}\left( \delta x \frac{\partial}{\partial x} + \delta y \frac{\partial}{\partial y} \right)\left( \delta x \frac{\partial}{\partial x} + \delta y \frac{\partial}{\partial y} \right)f}_{x_0, y_0} \\
	    & = \eval{\frac{1}{2}\left( (\delta x)^2 f_{xx} + \delta x \delta y f_{xy} + \delta y \delta x f_{yx} + (\delta y)^2 f_{yy} \right)}_{x_0, y_0}                                                                        \\
	    & = \frac{1}{2}\begin{pmatrix}
		\delta x & \delta y
	\end{pmatrix} \eval{\begin{pmatrix}
			f_{xx} & f_{xy} \\
			f_{yx} & f_{yy}
		\end{pmatrix}}_{x_0, y_0} \begin{pmatrix}
		\delta x \\ \delta y
	\end{pmatrix}
\end{align*}

The matrix
\[
	H = \begin{pmatrix}
		f_{xx} & f_{xy} \\
		f_{yx} & f_{yy}
	\end{pmatrix} = \grad(\grad f)
\]
as used in the second derivative above, is called the Hessian matrix.
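For smooth functions the off-diagonal entries coincide, \(f_{xy} = f_{yx}\). A quick numerical check, with an assumed sample function \(f(x, y) = x^3 y + x y^2\):

```python
# Sketch: numerically estimate the mixed second partial of an assumed
# sample function f(x, y) = x^3 y + x y^2 and compare with the analytic
# value f_xy = f_yx = 3x^2 + 2y.
def f(x, y):
    return x**3 * y + x * y**2

def mixed_partial(x, y, h=1e-4):
    # Central-difference stencil for d^2 f / dx dy; note it is symmetric
    # in x and y, mirroring f_xy = f_yx for smooth f.
    return (f(x + h, y + h) - f(x + h, y - h)
            - f(x - h, y + h) + f(x - h, y - h)) / (4 * h * h)

analytic = 3 * 2.0**2 + 2 * 1.5      # f_xy at (2, 1.5) is 15
print(abs(mixed_partial(2.0, 1.5) - analytic) < 1e-5)  # prints True
```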

Putting this together, in 2D Cartesian Coordinates, we have
\begin{align*}
	f(x, y) & = f(x_0, y_0) + (x-x_0)\eval{f_x}_{x_0, y_0} + (y-y_0)\eval{f_y}_{x_0, y_0} \\&+ \frac{1}{2}\left[ (x-x_0)^2\eval{f_{xx}}_{x_0, y_0} + (y-y_0)^2\eval{f_{yy}}_{x_0, y_0} + 2(x-x_0)(y-y_0)\eval{f_{xy}}_{x_0, y_0} \right] + \dots
\end{align*}
And in the general coordinate-independent form:
\[
	f(\vb x) = f(\vb x_0) + \delta \vb x \cdot \grad f(\vb x_0) + \frac{1}{2}\delta \vb x \cdot \eval{[\grad (\grad f)]}_{\vb x_0} \cdot \delta \vb x^\transpose + \dots
\]
where \(\delta \vb x = \vb x - \vb x_0\).
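To see the expansion at work, here is an illustrative numerical sketch (sample function, point, and displacement assumed, not from the notes) comparing \(f\) near \(\vb x_0\) with its second-order Taylor approximation; the residual is third order in the displacement:

```python
# Sketch: second-order Taylor expansion of an assumed sample function
# f(x, y) = x^3 + x*y^2 about (x0, y0) = (1, 2), with hand-coded derivatives.
def f(x, y):
    return x**3 + x * y**2

x0, y0 = 1.0, 2.0
fx, fy = 3*x0**2 + y0**2, 2*x0*y0        # gradient at (x0, y0)
fxx, fxy, fyy = 6*x0, 2*y0, 2*x0         # Hessian entries at (x0, y0)

dx, dy = 0.01, -0.02                     # small displacement delta_x
taylor = (f(x0, y0) + fx*dx + fy*dy                      # terms (1)
          + 0.5*(fxx*dx*dx + 2*fxy*dx*dy + fyy*dy*dy))   # term (2)
exact = f(x0 + dx, y0 + dy)
print(abs(exact - taylor))   # residual ~ 5e-6, cubic in the displacement
```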

\subsection{Classifying stationary points}
Since \(\grad f = \vb 0\) defines a stationary point, the Taylor series expansion around a stationary point \(\vb x = \vb x_s\) is
\[
	f(\vb x) \approx f(\vb x_s) + \frac{1}{2}\delta \vb x \cdot \eval{H}_{\vb x_s} \cdot \delta \vb x^\transpose
\]
So the nature of the stationary point depends on the Hessian matrix \(H\).
Consider a function in \(n\)-dimensional space
\[
	f = f(x_1, x_2, \dots, x_n)
\]
Then the \(n\)-dimensional Hessian matrix is given by
\[
	H = \begin{pmatrix}
		f_{x_1 x_1} & f_{x_1 x_2} & \cdots & f_{x_1 x_n} \\
		f_{x_2 x_1} & f_{x_2 x_2} & \cdots & f_{x_2 x_n} \\
		\vdots      & \vdots      & \ddots & \vdots      \\
		f_{x_n x_1} & f_{x_n x_2} & \cdots & f_{x_n x_n}
	\end{pmatrix}
\]
If all of these derivatives are defined, \(f_{x_1x_2} = f_{x_2x_1}\) etc, so \(H = H^\transpose\), i.e.\ \(H\) is symmetric, and therefore it can be diagonalised with respect to its principal axes.
\[
	\delta \vb x \cdot H \cdot \delta \vb x^\transpose = \begin{pmatrix}
		\delta x_1 & \delta x_2 & \cdots & \delta x_n
	\end{pmatrix} \begin{pmatrix}
		\lambda_1 & 0         & \cdots & 0         \\
		0         & \lambda_2 & \cdots & 0         \\
		\vdots    & \vdots    & \ddots & \vdots    \\
		0         & 0         & \cdots & \lambda_n
	\end{pmatrix} \begin{pmatrix}
		\delta x_1 \\ \delta x_2 \\ \vdots \\ \delta x_n
	\end{pmatrix}
\]
where the \(\lambda_i\) are the eigenvalues of \(H\) and each \(\delta x_i\) is the displacement along the corresponding principal axis (eigenvector) \(i\).
Therefore
\[
	\delta \vb x \cdot H \cdot \delta \vb x^\transpose = \lambda_1 \delta x_1^2 + \lambda_2 \delta x_2^2 + \dots + \lambda_n \delta x_n^2
\]
\begin{enumerate}
	\item At a minimum point, \(\delta \vb x \cdot H \cdot \delta \vb x^\transpose > 0\) for any \(\delta \vb x\) (moving in any direction, we go `uphill').
	      So all the \(\lambda_i > 0\), i.e.\ \(H\) is positive definite.
	\item At a maximum point, \(\delta \vb x \cdot H \cdot \delta \vb x^\transpose < 0\) for any \(\delta \vb x\) (moving in any direction, we go `downhill').
	      So all the \(\lambda_i < 0\), i.e.\ \(H\) is negative definite.
	\item At a saddle point, \(H\) is indefinite.
\end{enumerate}
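In 2D the eigenvalues of \(H\) follow directly from its trace and determinant, so this classification can be sketched in a few lines (example Hessian entries assumed, not from the notes):

```python
import math

# Sketch: classify a 2x2 symmetric Hessian by the signs of its eigenvalues,
# computed from the trace and determinant.
def classify(fxx, fxy, fyy):
    tr = fxx + fyy
    det = fxx * fyy - fxy * fxy
    disc = math.sqrt(tr*tr - 4*det)      # real, since H is symmetric
    l1, l2 = (tr + disc) / 2, (tr - disc) / 2
    if l1 > 0 and l2 > 0:
        return "minimum"                 # H positive definite
    if l1 < 0 and l2 < 0:
        return "maximum"                 # H negative definite
    return "saddle"                      # H indefinite

print(classify(2, 0, 3))    # minimum
print(classify(-1, 0, -4))  # maximum
print(classify(1, 3, 1))    # saddle: eigenvalues 4 and -2
```

A zero eigenvalue (\(\abs{H} = 0\)) is the degenerate case, where higher-order terms in the Taylor series are needed.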

\subsection{Signature of Hessian}
\begin{definition}
	The signature of \(H\) is the pattern of the signs of its subdeterminants.
\end{definition}
For a function \(f(x_1, x_2, \dots, x_n)\), we want the signs of
\[
	\underbrace{\abs{f_{x_1x_1}}}_{\abs{H_1}}, \underbrace{\begin{vmatrix}
			f_{x_1 x_1} & f_{x_1 x_2} \\
			f_{x_2 x_1} & f_{x_2 x_2}
		\end{vmatrix}}_{\abs{H_2}},\dots, \underbrace{\begin{vmatrix}
			f_{x_1 x_1} & f_{x_1 x_2} & \cdots & f_{x_1 x_n} \\
			f_{x_2 x_1} & f_{x_2 x_2} & \cdots & f_{x_2 x_n} \\
			\vdots      & \vdots      & \ddots & \vdots      \\
			f_{x_n x_1} & f_{x_n x_2} & \cdots & f_{x_n x_n}
		\end{vmatrix}}_{\abs{H_n}}
\]
We know from Vectors and Matrices that if a symmetric matrix \(H\) is positive (or negative) definite, then its leading submatrices \(H_1, H_2, \dots, H_{n-1}\) are also positive (or negative) definite.
This is known as Sylvester's criterion.
In other words, a minimum (or maximum) point in \(n\)-dimensional space is also a minimum (or maximum) in any subspace containing this point.
Now let us list the signs of subdeterminants to see the types of signatures.
\begin{enumerate}
	\item At a minimum point (\(\lambda_i > 0\)), the signature is \(+, +, +, +, \dots\)
	\item At a maximum point (\(\lambda_i < 0\)), the signature is \(-, +, -, +, \dots\)
\end{enumerate}
If \(\abs{H} = 0\), we need higher order terms in the Taylor series.

\subsection{Contours near stationary points}
Consider a coordinate system aligned with the principal axes of the Hessian \(H\) in two-dimensional space, so
\[
	H = \begin{pmatrix}
		\lambda_1 & 0 \\ 0 & \lambda_2
	\end{pmatrix}
\]
Let \(\delta \vb x = (\vb x - \vb x_s) = (\xi, \eta)\) where \(\vb x_s\) is the stationary point we're considering.
In a small region near \(\vb x_s\), along any contour of \(f\),
\[
	f = \text{constant} \approx f(\vb x_s) + \frac{1}{2}\delta \vb x \cdot H \cdot \delta \vb x^\transpose
\]
\begin{equation}\label{contourhessian}
	\therefore\ \lambda_1 \xi^2 + \lambda_2 \eta^2 \approx \text{constant}
\end{equation}
Near a minimum or maximum point, \(\lambda_1\) and \(\lambda_2\) have the same sign, so \eqref{contourhessian} implies that the contours of \(f\) are elliptical.
Near a saddle point, \(\lambda_1\) and \(\lambda_2\) have opposite sign so \eqref{contourhessian} shows that the contours of \(f\) are hyperbolic.
As an example, let us consider
\[
	f(x,y) = 4x^3 - 12xy + y^2 + 10y + 6
\]
Let us first identify the stationary points by solving
\[
	f_x = 12x^2 - 12y = 0;\quad f_y = -12x + 2y + 10 = 0
\]
The first equation gives \(y = x^2\); substituting into the second gives \(x^2 - 6x + 5 = 0\), hence
\[
	(x,y) = (1, 1), (5, 25)
\]
To construct the Hessian matrix, we need the second partial derivatives:
\begin{align*}
	f_{xx}          & = 24x \\
	f_{xy} = f_{yx} & = -12 \\
	f_{yy}          & = 2
\end{align*}
Now considering the stationary points separately:
\begin{itemize}
	\item \((1, 1)\):
	      \[
		      H = \begin{pmatrix}
			      24 & -12 \\ -12 & 2
		      \end{pmatrix} \implies \abs{H_1} = 24;\quad \abs{H} = 48-144 = -96
	      \]
	      The signature is \(+, -\), so this is a saddle point.
	\item \((5, 25)\):
	      \[
		      H = \begin{pmatrix}
			      120 & -12 \\ -12 & 2
		      \end{pmatrix} \implies \abs{H_1} = 120;\quad \abs{H} = 240-144 = 96
	      \]
	      The signature is \(+, +\), so this is a minimum point.
\end{itemize}
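The worked example can also be verified with a short script (an illustrative sketch, using the derivatives computed above):

```python
# Sketch: verify the worked example f = 4x^3 - 12xy + y^2 + 10y + 6 by
# checking grad f = 0 and the leading-subdeterminant signature at each point.
def grads(x, y):
    return 12*x*x - 12*y, -12*x + 2*y + 10   # f_x, f_y

def signature(x, y):
    fxx, fxy, fyy = 24*x, -12, 2             # Hessian entries
    h1 = fxx                                 # |H_1|
    h2 = fxx * fyy - fxy * fxy               # |H|
    return ('+' if h1 > 0 else '-', '+' if h2 > 0 else '-')

for x, y in [(1, 1), (5, 25)]:
    print((x, y), grads(x, y), signature(x, y))
# (1, 1):  grad = (0, 0), signature ('+', '-')  -> saddle
# (5, 25): grad = (0, 0), signature ('+', '+')  -> minimum
```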
